Skip to content

fix(runtime, channels): unify session backend behind one factory#6384

Merged
theonlyhennygod merged 4 commits intozeroclaw-labs:masterfrom
singlerider:fix/5769-session-tools-sqlite-backend
May 6, 2026
Merged

fix(runtime, channels): unify session backend behind one factory#6384
theonlyhennygod merged 4 commits intozeroclaw-labs:masterfrom
singlerider:fix/5769-session-tools-sqlite-backend

Conversation

@singlerider
Copy link
Copy Markdown
Collaborator

@singlerider singlerider commented May 5, 2026

Summary

The session tools (sessions_list, sessions_history, sessions_send, sessions_current) were registered against the JSONL SessionStore, but the gateway WebSocket handler persisted sessions to SqliteSessionBackend. Any session created via /ws/chat was therefore invisible to the agent's own session tools (sessions_list returned "No active sessions found" for valid gateway session IDs, and the agent then hallucinated session IDs).

Naively swapping the tool registration to SQLite would have traded one half-blindness for another: the channel orchestrator was opening JSONL too, so channel sessions (Telegram, Slack, Discord, ...) would have stopped showing up in the tools.

Introduce zeroclaw_infra::make_session_backend(workspace_dir, &str) that picks JSONL or SQLite based on the existing [channels].session_backend config field (which the schema already exposed; nothing was reading it). All three call sites — the channel orchestrator, the runtime tool registration, and the gateway /ws/chat session store — now go through this factory. The configured backend becomes the single source of truth across channel, gateway, and tool reads.

The orchestrator's session_store field switches from Option<Arc<SessionStore>> to Option<Arc<dyn SessionBackend>>. The mtime-based hydration sort is replaced with the trait's list_sessions_with_metadata().last_activity field; SessionStore overrides the trait default to fill last_activity from session_mtime so JSONL hydration ordering is preserved.

When the SQLite backend opens for the first time and legacy sessions/*.jsonl files exist on disk, make_session_backend calls SqliteSessionBackend::migrate_from_jsonl to import them and renames each source file to *.jsonl.migrated so a rollback is recoverable. Migration errors are logged and skipped; the backend still opens.

Closes #5769. Supersedes #5770 — the original one-line tool-registration fix is folded into the broader factory in this PR.

Validation Evidence

$ cargo fmt --all -- --check
# (clean)

$ cargo clippy --workspace --exclude zeroclaw-desktop --all-targets --features ci-all -- -D warnings
# Finished `dev` profile [unoptimized + debuginfo] target(s)

$ cargo test -p zeroclaw-infra --lib
running 62 tests
... 4 new factory tests pass:
test tests::make_session_backend_jsonl_round_trips_through_session_store ... ok
test tests::make_session_backend_sqlite_round_trips_through_sqlite_db ... ok
test tests::make_session_backend_unknown_value_falls_back_to_sqlite ... ok
test tests::make_session_backend_sqlite_imports_legacy_jsonl_on_first_open ... ok
test result: ok. 62 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

$ cargo test --workspace --exclude zeroclaw-desktop --features ci-all --lib
# all per-crate suites passing

Security & Privacy Impact

  • New permissions, capabilities, or filesystem access scope? No.
  • New external network calls? No.
  • Secrets / tokens / credentials handling changed? No.
  • PII or real identities in diff, tests, fixtures, or docs? No.

Compatibility

  • New deployments and operators on the default config: seamless. session_backend = "sqlite" (the default) opens the SQLite store; channel, gateway, and tool reads share it.
  • Upgrading deployments with existing JSONL channel sessions: on first open under the SQLite default, make_session_backend imports sessions/*.jsonl into sessions/sessions.db and renames each source file to *.jsonl.migrated. Sessions remain visible to the channel orchestrator and to the new tools after upgrade. *.jsonl.migrated files stay on disk so an operator who needs to roll back can rename them back.
  • Operators on session_backend = "jsonl": unchanged behaviour. JSONL files keep their existing read/write semantics; the gateway WS path now also reads/writes JSONL, closing the original Session tools (sessions_list/sessions_history) use JSONL backend instead of SQLite — returns empty for gateway sessions #5769 split for that backend choice.
  • Config / env / CLI surface changed? No new fields. The existing [channels].session_backend is now load-bearing for all three call sites — channels, gateway, and runtime tools — instead of being mostly ignored.

Rollback

git revert <merge-sha> returns to the pre-PR shape. Imported sessions remain readable in the SQLite db; the renamed *.jsonl.migrated files can be renamed back to *.jsonl if the operator wants to fall back to the JSONL backend.

Co-authored-by: Lior-Dav Lior-Dav@users.noreply.github.com

The gateway WebSocket handler persists sessions to SqliteSessionBackend
(sessions/sessions.db, with gw_-prefixed keys) and the gateway REST API
reads from the same place. The session tools (sessions_list,
sessions_history, sessions_send) were initialized with the JSONL
SessionStore, so they looked at sessions/*.jsonl files the gateway
never writes.

Result: gateway sessions are invisible to the agent. sessions_list
returns "No active sessions found" even when the SQLite store has
gw_ entries, and sessions_history returns "No messages found" for any
gateway session ID. The agent then hallucinates session IDs because
the tool returns empty data.

Switch the registration site at crates/zeroclaw-runtime/src/tools/mod.rs
to SqliteSessionBackend. Both backends implement the SessionBackend
trait with the same new(&Path) signature, so this is a one-line swap
plus a comment update.

Closes zeroclaw-labs#5769.
@singlerider singlerider added bug Something isn't working gateway Auto scope: src/gateway/** changed. labels May 5, 2026
@singlerider singlerider self-assigned this May 5, 2026
@singlerider singlerider added tool Auto scope: src/tools/** changed. risk: high Auto risk: security/runtime/gateway/tools/workflows. size: XS Auto size: <=80 non-doc changed lines. labels May 5, 2026
The original PR swapped the runtime tools to SqliteSessionBackend
without touching the channel orchestrator, which still hardcoded the
JSONL SessionStore. That trades one half-blindness for another:
sessions written by channels (Telegram, Slack, Discord, ...) become
invisible to the agent's sessions_list / sessions_history tools.

Introduce zeroclaw_infra::make_session_backend(workspace_dir, &str)
that picks JSONL or SQLite based on [channels].session_backend (which
the schema already exposes; nothing was reading it). Both call sites
(channel orchestrator session_store, runtime tool registration) now
go through this factory, so the configured backend is the single
source of truth and channel + WS sessions both surface in the same
store.

The orchestrator's session_store field switches from
Option<Arc<SessionStore>> to Option<Arc<dyn SessionBackend>>. The
mtime-based hydration sort is replaced with the trait's
list_sessions_with_metadata() last_activity field, which both
backends already populate, so there is no SessionStore-specific
method on the hot path.

Closes zeroclaw-labs#5769
@github-actions github-actions Bot removed gateway Auto scope: src/gateway/** changed. tool Auto scope: src/tools/** changed. labels May 5, 2026
@singlerider singlerider added this to the v0.7.5 milestone May 5, 2026
@singlerider singlerider changed the title fix(runtime/tools): use SQLite backend for session tools fix(runtime, channels): unify session backend behind one factory May 5, 2026
@singlerider singlerider marked this pull request as ready for review May 5, 2026 06:28
Copy link
Copy Markdown
Collaborator

@Audacity88 Audacity88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed current PR head eb9d5b6. I checked live PR state, linked issue #5769, overlapping PR #5770, CI status, formal reviews/comments, the full diff, current labels, and the adjacent session backend call sites in runtime, channels, gateway, config, and infra. CI is green, current labels are bug, size: XS, and risk: high, and I did not find prior reviews or inline comments on this PR.

🟢 What looks good — the shared factory is the right direction

This fixes the important limitation in #5770's one-line SQLite swap: channel sessions and runtime session tools should not be hardwired to different stores. Moving the runtime tools and channel orchestrator toward a shared SessionBackend factory is the right shape.

🔴 Blocking — session_backend = "jsonl" still splits gateway sessions from the tools

The PR body says the configured backend becomes the single source of truth and that channel + WS sessions both surface in the same store. That is only true for the default SQLite path.

The gateway still always opens SqliteSessionBackend for /ws/chat persistence, while the runtime tools now open make_session_backend(workspace_dir, config.channels.session_backend). If an operator sets [channels].session_backend = "jsonl", the tools read JSONL, channels read JSONL, but gateway WS sessions are still written to SQLite. That recreates the original #5769 failure for gateway sessions: sessions_list / sessions_history will not see /ws/chat sessions.

Could you either route the gateway through the same backend decision too, or narrow the config contract so jsonl is explicitly channel-only and the runtime tools cannot claim gateway parity under that setting?

🔴 Blocking — existing channel JSONL histories stop hydrating under the default config

On master, the channel orchestrator always opens the JSONL SessionStore when channels.session_persistence is enabled. This PR changes that to make_session_backend(...), whose default is "sqlite". For existing users who never set session_backend, their channel sessions were written as sessions/*.jsonl, but after this change the default channel hydration path reads sessions/sessions.db instead.

I found SqliteSessionBackend::migrate_from_jsonl(...), but this PR does not call it. That means existing channel history can silently disappear from startup hydration and from the session tools after upgrade, unless the operator happens to add session_backend = "jsonl" manually. The PR body's compatibility and rollback notes say no migration is required because each backend reads its own format, but the default behavior changes which format gets read.

Could you add an explicit migration/compatibility path for existing JSONL channel sessions, or preserve the legacy JSONL default for channels until there is a deliberate migration step?

🟡 Warning — explicit JSONL mode loses the old mtime-based hydration ordering

The channel startup hydration used to sort JSONL session keys by file mtime before truncating to MAX_CONVERSATION_SENDERS. This PR replaces that with SessionBackend::list_sessions_with_metadata().last_activity, but SessionStore does not override list_sessions_with_metadata(). It gets the trait default, which sets last_activity = Utc::now() for every key.

For operators who explicitly choose session_backend = "jsonl", startup hydration can now pick an arbitrary subset once there are more than MAX_CONVERSATION_SENDERS persisted sessions. That is smaller than the migration/default issue above, but it is the same compatibility surface: JSONL mode should keep the old mtime ordering or provide real metadata.

🟡 Warning — this should acknowledge the superseded contributor PR

#6384 appears to replace #5770 by @Lior-Dav. It keeps the same linked issue and carries forward the core diagnosis and initial one-line SQLite-tool fix, then expands it into a fuller backend factory. Since #5770 is still open and maintainerCanModify is true, the superseding path should be explicit: either update #5770 directly, or mark #6384 as superseding it and include the appropriate PR-body attribution / co-author handling for any incorporated work.

Copy link
Copy Markdown
Collaborator

@WareWolf-MoonWall WareWolf-MoonWall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review draft — PR #6384

fix(runtime, channels): unify session backend behind one factory

Proposed verdict: --comment


Take-stock

Audacity88 holds active CHANGES_REQUESTED (posted 2h ago on head commit eb9d5b6) with
2 🔴 blocking findings and 2 🟡 warnings. No other reviewer has weighed in. I cannot
approve over an active block. I'm adding my own read of the diff, confirming Audacity88's
findings against local source, and raising two new observations.


What I looked at

  • Full diff (orchestrator/mod.rs, zeroclaw-infra/src/lib.rs, zeroclaw-runtime/src/tools/mod.rs)
  • Local source: crates/zeroclaw-gateway/src/lib.rsrun_gateway session init path
  • Local source: crates/zeroclaw-infra/src/session_backend.rslist_sessions_with_metadata default impl
  • Local source: crates/zeroclaw-infra/src/session_sqlite.rsmigrate_from_jsonl
  • Local source: crates/zeroclaw-config/src/schema.rsdefault_session_backend()
  • Open PR #5770 by @Lior-Dav (still open, same linked issue #5769)

🟢 [praise] The factory design is right. A single make_session_backend entry point
that both the channel orchestrator and runtime tools route through is the correct fix for
the root problem in #5769: the session gap was a shared-state boundary problem, not a
per-call-site problem, and the fix belongs at the boundary. The unknown-value fallback
with tracing::warn! is good defensive coding — a config typo can never silently disable
persistence, it just falls back to a safe default with an explanatory warning (per
FND-006 §4.1).


Audacity88's active findings — confirmed

🔴 [blocking — confirmed from source] Gateway split under session_backend = "jsonl"

zeroclaw-gateway/src/lib.rs at the run_gateway session-init path (around L741)
unconditionally opens SqliteSessionBackend::new(&config.workspace_dir) whenever
config.gateway.session_persistence is true. make_session_backend is not wired into
the gateway. So for any operator who sets [channels].session_backend = "jsonl", the
runtime tools and channel orchestrator will read JSONL, but the gateway /ws/chat path
continues to write SQLite. sessions_list / sessions_history will not surface gateway
sessions — the original #5769 failure mode, now on a different backend pairing.

Audacity88's suggested resolution paths are both viable: route the gateway through the
same factory (requires threading the backend decision into run_gateway, whether via
channels.session_backend or a dedicated gateway.session_backend field), or explicitly
narrow the contract so jsonl is documented as channel-only and the PR body removes the
gateway-parity claim for that mode.

🔴 [blocking — confirmed from source] Existing JSONL sessions go dark on upgrade

config/schema.rs defines default_session_backend() returning "sqlite", and
ChannelsConfig::default() sets session_backend: default_session_backend(). Any
deployment without an explicit [channels] session_backend = "jsonl" in their config
will have config.channels.session_backend == "sqlite" — which was always the case,
but before this PR the channel orchestrator ignored that field and always opened
SessionStore (JSONL). This PR makes the field load-bearing: the orchestrator now calls
make_session_backend(..., "sqlite") for the default config. Existing JSONL session
files in sessions/*.jsonl become invisible to startup hydration.

SqliteSessionBackend::migrate_from_jsonl exists and is tested in
session_sqlite.rs — but this PR does not call it. The PR compatibility note
("Backward compatible? Yes for the default config") describes a new-deployment guarantee,
not an upgrade guarantee. For upgrading users on the default config, this is a silent
data-visibility regression.

Resolution options: (a) call migrate_from_jsonl during make_session_backend (or in
start_channels after backend init) when .jsonl files are present in the workspace;
(b) preserve the JSONL default for channels until a deliberate, announced migration step
ships; or (c) make migrate_from_jsonl a documented manual step with a startup warning
when JSONL files are detected in a SQLite-configured workspace.


Audacity88's warnings also hold

🟡 [warning — confirmed] JSONL hydration ordering regresses under explicit jsonl mode

The list_sessions_with_metadata default implementation in session_backend.rs sets
last_activity: Utc::now() for every session key (because SessionStore does not
override this method and the trait default calls self.load(&key) and then timestamps
with Utc::now()). The new hydration sort — sort_by_key(|m| Reverse(m.last_activity))
— therefore produces an arbitrary order for JSONL deployments once more than
MAX_CONVERSATION_SENDERS sessions are persisted. The previous mtime-based ordering
used store.session_mtime(&k), which reads the actual on-disk file modification time.
SessionStore already exposes session_mtime as a concrete method. Implementing
list_sessions_with_metadata on SessionStore using that method would fix the
regression while keeping the trait abstraction intact.

🟡 [warning — confirmed] Supersede attribution for #5770

PR #5770 by @Lior-Dav is still open against the same issue (#5769) and carries the
initial diagnosis and one-line SQLite tool-registration fix that this PR expands into a
full factory. Per the project workflow (AGENTS.md, "Supersedes #" rules), the handling
should be explicit: either mark this PR Supersedes #5770 with Co-authored-by if
code or design was carried forward, or coordinate with @Lior-Dav to update or close
#5770 before this lands. The current state leaves both PRs open on the same issue with
overlapping scope, which creates a merge-ordering ambiguity.


New findings

🟡 [warning — new] make_session_backend has no tests

This is now the single load-bearing factory for both the channel orchestrator and the
runtime tool registration. The three match arms ("jsonl", "sqlite", unknown fallback)
each affect different user configurations, and the fallback behavior (unknown value →
SQLite + warning rather than error) is a deliberate user-visible contract. Unit tests for
the factory — using tempdir as the existing backend tests do — would be appropriate,
especially given that the default behavior matters for upgrade compatibility. A test that
exercises each arm and verifies the returned backend type (or at least that it
successfully loads/saves a session) would close the gap.

🔵 [note] Compatibility claim in the PR body

The "Backward compatible? Yes for the default config (session_backend = 'sqlite')"
note will be read by reviewers as "upgrading users are safe." It means the opposite for
any upgrading user on the default config who has existing JSONL channel sessions.
Once the migration/blocking issue above is resolved, updating this note to reflect the
actual upgrade contract ("yes for new deployments or explicitly-SQLite configs; upgrade
path for existing JSONL sessions requires X") would make the PR body accurate for the
release notes and changelog.


The factory shape is right and the architectural direction is correct per FND-001 §4.1.
Looking forward to the follow-up on the two blocking issues.

@WareWolf-MoonWall
Copy link
Copy Markdown
Collaborator

@JordanTheJet — milestone alignment needed: this PR does not clearly fit within the scope boundary of any open milestone. Please advise on placement or deferral.

…time sort

Per @Audacity88 / @WareWolf-MoonWall review:

1. Gateway `/ws/chat` session persistence now routes through
   `make_session_backend(workspace_dir, &config.channels.session_backend)`
   instead of unconditionally opening `SqliteSessionBackend`. With
   `session_backend = "jsonl"` the gateway, channels, and tools all read
   the same store; the previous gateway-only SQLite hardcode would have
   recreated the original session-split bug under that config.

2. `make_session_backend` opens SQLite via a new
   `open_sqlite_with_jsonl_import` helper that calls
   `SqliteSessionBackend::migrate_from_jsonl` on first open. Legacy
   `sessions/*.jsonl` files get imported and renamed to `*.jsonl.migrated`,
   so an upgrading deployment on the default config doesn't lose channel
   history. Migration errors log + skip rather than blocking startup.

3. `SessionStore` now overrides `list_sessions_with_metadata` to fill
   `last_activity` from `session_mtime` instead of inheriting the trait
   default (which stamped every key with `Utc::now()`). The orchestrator's
   hydration sort now picks the most-recent JSONL sessions deterministically
   when truncating to MAX_CONVERSATION_SENDERS.

4. New tests cover all three `make_session_backend` arms (jsonl, sqlite,
   unknown→sqlite fallback) plus the JSONL→SQLite import flow with the
   `*.jsonl.migrated` rollback file.
Copy link
Copy Markdown
Collaborator

@theonlyhennygod theonlyhennygod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First review on eb9d5b657e9337d86a221c7d8037fb384f064369. No prior formal reviews, no inline threads, no top-level comments. Closes #5769. Read the diff against master, the SessionBackend trait + both backends, the schema field, and the orphan-rollback test path.

The bug diagnosis is exact. The factory shape is the right answer.

What's working well 🟢

  • The framing in the body — "Naively swapping the tool registration to SQLite would have traded one half-blindness for another" — is the right diagnosis. The factory exists because the bug isn't "wrong default chosen," it's "two call sites assuming different backends." The fix has to live above both, not at either one.
  • Type-erasing the orchestrator's session_store from Option<Arc<SessionStore>> to Option<Arc<dyn SessionBackend>> is what makes the rest of the change actually unify anything. Without it the factory would have been window-dressing.
  • SessionMetadata::last_activity was already on the trait and SQLite was already populating it — removing the JSONL-only session_mtime() side call from the hot path and going through the trait is the right shape.
  • Validation evidence cites the exact test that exercises the new shape (rollback_orphan_user_turn_also_removes_from_session_store). I appreciate the literal command list.

🟡 [warning]

1. JSONL hydration order is now arbitrary

The new sort:

let mut metadata = store.list_sessions_with_metadata();
metadata.sort_by_key(|m| std::cmp::Reverse(m.last_activity));

depends on each backend reporting an actual last_activity. SQLite does — SqliteSessionBackend overrides list_sessions_with_metadata (crates/zeroclaw-infra/src/session_sqlite.rs:303-339) and reads it from session_metadata.last_activity.

JSONL doesn't. SessionStore at crates/zeroclaw-infra/src/session_store.rs:157-185 implements only the required trait methods and inherits the default list_sessions_with_metadata from crates/zeroclaw-infra/src/session_backend.rs:65-80, which constructs last_activity: Utc::now() for every key. After this PR, JSONL hydration:

  1. Loads metadata where every entry has the same last_activity (the moment of the call).
  2. Sorts by reverse-last_activity, which is stable in sort_by_key but the key is identical — so ordering falls back to insertion order from read_dir (filesystem-defined, non-deterministic).
  3. Truncates to MAX_CONVERSATION_SENDERS, dropping a non-deterministic subset.

Master's previous code sorted by file mtime, which is at least deterministic and roughly approximates "most recently active." The PR's stated compatibility line — "Operators who set session_backend = "jsonl" continue to read/write JSONL files" — is true for read/write but not for the truncation choice during hydration.

Two clean options:

a. Override list_sessions_with_metadata on SessionStore to populate last_activity from file mtime:

fn list_sessions_with_metadata(&self) -> Vec<SessionMetadata> {
    self.list_sessions()
        .into_iter()
        .map(|key| {
            let last_activity = self.session_mtime(&key)
                .and_then(|t| t.duration_since(std::time::UNIX_EPOCH).ok())
                .and_then(|d| DateTime::<Utc>::from_timestamp(d.as_secs() as i64, 0))
                .unwrap_or_else(Utc::now);
            // ... message_count etc.
        })
        .collect()
}

Keeps session_mtime load-bearing, JSONL hydration stays deterministic.

b. Delete session_mtime and document that JSONL backend has unspecified hydration order under MAX_CONVERSATION_SENDERS truncation. Reasonable because JSONL is legacy/secondary, but it should be called out so an operator who was relying on mtime ordering knows it's gone.

Either works. Currently the diff implies (a) is true (hydration sort is the same shape) but actually ships (b)'s behavior on JSONL.

2. session_mtime is dead code after this lands

crates/zeroclaw-infra/src/session_store.rs:134 is pub fn session_mtime(...). After this PR, the only caller (orchestrator/mod.rs:5801) goes away. pub fn with no callers will trigger dead_code once warnings are treated as errors elsewhere in the workspace. If you take option (1a) above, this becomes load-bearing again. If you take (1b), delete it in the same commit so the cleanup is atomic.

🔵 [suggestion]

3. Factory has no tests

make_session_backend has three meaningful branches:

  • "jsonl" returns a SessionStore-backed Arc<dyn SessionBackend>
  • "sqlite" returns a SqliteSessionBackend-backed one
  • anything else logs a warning and falls back to SQLite

A 15-line #[cfg(test)] block covering all three (plus assertion that the fallback case actually constructs a working backend, not just returns Ok) would lock the contract. Right now if someone changes the fallback default to JSONL the schema-mismatch wouldn't fail any test.

4. Fallback-on-unknown is a policy choice worth surfacing

The doc comment says "Unknown values fall back to SQLite with a warning so a typo in config never silently disables persistence." That's defensible — silent disable is the worst failure mode. But the alternative is rejecting at config validation time so the operator sees session_backend = "sqlte" flagged on startup before it falls back to a different backend than they configured. The current shape lets a typo silently change which on-disk format the operator's data lands in.

If config validation is supposed to be strict, this should bail! at startup with a list of valid values. If silent fallback-with-warning is the intended behavior, the doc comment is fine as-is — just worth being explicit that this was a chosen policy, not an accident. Consider adding a sentence like "Treat as policy: typos silently fall through rather than failing startup, on the principle that disabled persistence is worse than a wrong-but-working backend."

Summary

JSONL hydration regression in (1) is the only thing I'd want addressed before merge — everything else is hygiene. The bug fix itself is clean, the architectural shape is right, and the test coverage that already exists exercises the orphan-rollback path through the trait. Once JSONL ordering is either fixed or explicitly accepted as deferred, this is straightforward to approve.

Milestone alignment: v0.7.5 fits cleanly per the milestone's "release / web polishing / runtime stability" scope.

…htly rustfmt

CI's Lint job runs nightly rustfmt; my local stable rustfmt 1.8.0 left
the long assert!/assert_eq! lines and a use-ordering nit untouched, so
the previous push failed the format check on the merge.

cargo +nightly fmt --all is clean now.
@singlerider singlerider marked this pull request as draft May 6, 2026 07:16
@singlerider singlerider marked this pull request as ready for review May 6, 2026 07:17
Copy link
Copy Markdown
Collaborator

@WareWolf-MoonWall WareWolf-MoonWall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review — #6384 fix(runtime, channels): unify session backend behind one factory

Verdict: --approve
Reviewer: @WareWolf-MoonWall
Head reviewed: aca9d87 (current HEAD; checks passing)
Prior review head: eb9d5b6 (COMMENTED)


Take-stock

@Audacity88's CHANGES_REQUESTED (at eb9d5b6) is DISMISSED. My prior COMMENTED review is not a block. theonlyhennygod's COMMENTED review raised the same JSONL ordering regression I identified — the diff resolves it. No active CHANGES_REQUESTED. Reviewing the fixed implementation fresh.


Prior findings — resolution status

✅ [resolved] 🔴 Gateway (run_gateway ~L741) hardcodes SqliteSessionBackend::new()

The run_gateway session-init path now calls:

zeroclaw_infra::make_session_backend(&config.workspace_dir, &config.channels.session_backend)

SqliteSessionBackend is no longer directly imported in zeroclaw-gateway/src/lib.rs. The gateway now honours the same [channels].session_backend config field as the channel orchestrator and runtime tools. Setting session_backend = "jsonl" no longer creates the original #5769 failure mode on a different backend pairing.

✅ [resolved] 🔴 Existing JSONL sessions go dark on upgrade

open_sqlite_with_jsonl_import calls backend.migrate_from_jsonl(workspace_dir) immediately after opening the SQLite backend. Sessions are imported, each .jsonl file is renamed to .jsonl.migrated (rollback-recoverable), and migration errors are logged and skipped — the backend still opens. The startup is never blocked on a best-effort migration. The compatibility section in the PR body now accurately describes the upgrade path for default-config deployments.

✅ [resolved] 🟡 JSONL list_sessions_with_metadata sort regresses

SessionStore now overrides list_sessions_with_metadata using session_mtime. The implementation:

fn list_sessions_with_metadata(&self) -> Vec<SessionMetadata> {
    self.list_sessions().into_iter().map(|key| {
        let last_activity = self.session_mtime(&key)
            .map(DateTime::<Utc>::from)
            .unwrap_or_else(Utc::now);
        SessionMetadata { last_activity, key, ... }
    }).collect()
}

JSONL hydration now sorts by actual file mtime rather than the Utc::now() fallback from the trait default. The pre-PR ordering semantics are preserved. session_mtime remains a live, load-bearing method — no dead code concern.

✅ [resolved] 🟡 Supersede attribution for #5770

The PR body includes Supersedes #5770 and the commit carries Co-authored-by: Lior-Dav. @Lior-Dav's diagnosis and the initial one-line tool-registration fix that this PR expands are credited on the record. No overlapping open PRs on the same issue.

✅ [resolved] 🟡 make_session_backend has no tests

Four tests now cover all three factory arms plus the migration path:

  • make_session_backend_jsonl_round_trips_through_session_store
  • make_session_backend_sqlite_round_trips_through_sqlite_db
  • make_session_backend_unknown_value_falls_back_to_sqlite
  • make_session_backend_sqlite_imports_legacy_jsonl_on_first_open

The fallback-on-unknown behavior (SQLite + warning rather than startup error) is locked by the third test. The migration path is exercised end-to-end in the fourth, including the .jsonl.migrated rename assertion.

✅ [resolved] 🔵 Compatibility claim "Yes for the default config" misleads upgrading users

The compatibility section now clearly distinguishes new deployments, upgrading JSONL deployments, and explicit jsonl operators. The upgrade path is documented accurately.


What's working well

🟢 Factory architecture is the right fix for the root cause

The session gap in #5769 was a shared-state boundary problem — two call sites assuming different backends — not a per-call-site wrong default. make_session_backend solves it at the boundary. A future call site (a new gateway mode, an additional tool) picks up the correct backend automatically just by calling the factory. The fix has the right shape.

🟢 Error handling on migration follows FND-006 §4.1

migrate_from_jsonl errors are logged via tracing::warn! and skipped. The backend still opens. This is the correct failure mode: blocking startup on a best-effort migration would be worse than a partial migration. The log message includes the recovery instruction (switch to session_backend = "jsonl" if you need them visible immediately), which is what FND-006 §4.1 asks for — errors explain the path forward.

🟢 Type erasure in the orchestrator is load-bearing

Changing session_store from Option<Arc<SessionStore>> to Option<Arc<dyn SessionBackend>> is what makes the factory actually unify anything. Without it, make_session_backend would have been window dressing. The type change is what forces the fix to hold across future refactors.


Summary

All prior findings from my COMMENTED review at eb9d5b6 are resolved. The factory is correctly wired across the channel orchestrator, runtime tools, and gateway. JSONL hydration ordering is preserved. Migration runs automatically on first open. Tests cover all contract-relevant branches. CI is green.

Approving.

@theonlyhennygod theonlyhennygod merged commit c6a0ed7 into zeroclaw-labs:master May 6, 2026
12 checks passed
github-actions Bot pushed a commit that referenced this pull request May 6, 2026
…ory (#6384)

* fix(runtime/tools): use SQLite backend for session tools

The gateway WebSocket handler persists sessions to SqliteSessionBackend
(sessions/sessions.db, with gw_-prefixed keys) and the gateway REST API
reads from the same place. The session tools (sessions_list,
sessions_history, sessions_send) were initialized with the JSONL
SessionStore, so they looked at sessions/*.jsonl files the gateway
never writes.

Result: gateway sessions are invisible to the agent. sessions_list
returns "No active sessions found" even when the SQLite store has
gw_ entries, and sessions_history returns "No messages found" for any
gateway session ID. The agent then hallucinates session IDs because
the tool returns empty data.

Switch the registration site at crates/zeroclaw-runtime/src/tools/mod.rs
to SqliteSessionBackend. Both backends implement the SessionBackend
trait with the same new(&Path) signature, so this is a one-line swap
plus a comment update.

Closes #5769.

* fix(runtime, channels): route session backend through one factory

The original PR swapped the runtime tools to SqliteSessionBackend
without touching the channel orchestrator, which still hardcoded the
JSONL SessionStore. That trades one half-blindness for another:
sessions written by channels (Telegram, Slack, Discord, ...) become
invisible to the agent's sessions_list / sessions_history tools.

Introduce zeroclaw_infra::make_session_backend(workspace_dir, &str)
that picks JSONL or SQLite based on [channels].session_backend (which
the schema already exposes; nothing was reading it). Both call sites
(channel orchestrator session_store, runtime tool registration) now
go through this factory, so the configured backend is the single
source of truth and channel + WS sessions both surface in the same
store.

The orchestrator's session_store field switches from
Option<Arc<SessionStore>> to Option<Arc<dyn SessionBackend>>. The
mtime-based hydration sort is replaced with the trait's
list_sessions_with_metadata() last_activity field, which both
backends already populate, so there is no SessionStore-specific
method on the hot path.

Closes #5769

* fix(sessions): wire gateway through factory, migrate JSONL, restore mtime sort

Per @Audacity88 / @WareWolf-MoonWall review:

1. Gateway `/ws/chat` session persistence now routes through
   `make_session_backend(workspace_dir, &config.channels.session_backend)`
   instead of unconditionally opening `SqliteSessionBackend`. With
   `session_backend = "jsonl"` the gateway, channels, and tools all read
   the same store; the previous gateway-only SQLite hardcode would have
   recreated the original session-split bug under that config.

2. `make_session_backend` opens SQLite via a new
   `open_sqlite_with_jsonl_import` helper that calls
   `SqliteSessionBackend::migrate_from_jsonl` on first open. Legacy
   `sessions/*.jsonl` files get imported and renamed to `*.jsonl.migrated`,
   so an upgrading deployment on the default config doesn't lose channel
   history. Migration errors log + skip rather than blocking startup.

3. `SessionStore` now overrides `list_sessions_with_metadata` to fill
   `last_activity` from `session_mtime` instead of inheriting the trait
   default (which stamped every key with `Utc::now()`). The orchestrator's
   hydration sort now picks the most-recent JSONL sessions deterministically
   when truncating to MAX_CONVERSATION_SENDERS.

4. New tests cover all three `make_session_backend` arms (jsonl, sqlite,
   unknown→sqlite fallback) plus the JSONL→SQLite import flow with the
   `*.jsonl.migrated` rollback file.

* style(infra): rewrap make_session_backend test asserts to satisfy nightly rustfmt

CI's Lint job runs nightly rustfmt; my local stable rustfmt 1.8.0 left
the long assert!/assert_eq! lines and a use-ordering nit untouched, so
the previous push failed the format check on the merge.

cargo +nightly fmt --all is clean now. c6a0ed7
github-actions Bot pushed a commit to aliasliao/zeroclaw-contribute that referenced this pull request May 6, 2026
…ory (zeroclaw-labs#6384)

* fix(runtime/tools): use SQLite backend for session tools

The gateway WebSocket handler persists sessions to SqliteSessionBackend
(sessions/sessions.db, with gw_-prefixed keys) and the gateway REST API
reads from the same place. The session tools (sessions_list,
sessions_history, sessions_send) were initialized with the JSONL
SessionStore, so they looked at sessions/*.jsonl files the gateway
never writes.

Result: gateway sessions are invisible to the agent. sessions_list
returns "No active sessions found" even when the SQLite store has
gw_ entries, and sessions_history returns "No messages found" for any
gateway session ID. The agent then hallucinates session IDs because
the tool returns empty data.

Switch the registration site at crates/zeroclaw-runtime/src/tools/mod.rs
to SqliteSessionBackend. Both backends implement the SessionBackend
trait with the same new(&Path) signature, so this is a one-line swap
plus a comment update.

Closes zeroclaw-labs#5769.

* fix(runtime, channels): route session backend through one factory

The original PR swapped the runtime tools to SqliteSessionBackend
without touching the channel orchestrator, which still hardcoded the
JSONL SessionStore. That trades one half-blindness for another:
sessions written by channels (Telegram, Slack, Discord, ...) become
invisible to the agent's sessions_list / sessions_history tools.

Introduce zeroclaw_infra::make_session_backend(workspace_dir, &str)
that picks JSONL or SQLite based on [channels].session_backend (which
the schema already exposes; nothing was reading it). Both call sites
(channel orchestrator session_store, runtime tool registration) now
go through this factory, so the configured backend is the single
source of truth and channel + WS sessions both surface in the same
store.

The orchestrator's session_store field switches from
Option<Arc<SessionStore>> to Option<Arc<dyn SessionBackend>>. The
mtime-based hydration sort is replaced with the trait's
list_sessions_with_metadata() last_activity field, which both
backends already populate, so there is no SessionStore-specific
method on the hot path.

Closes zeroclaw-labs#5769

* fix(sessions): wire gateway through factory, migrate JSONL, restore mtime sort

Per @Audacity88 / @WareWolf-MoonWall review:

1. Gateway `/ws/chat` session persistence now routes through
   `make_session_backend(workspace_dir, &config.channels.session_backend)`
   instead of unconditionally opening `SqliteSessionBackend`. With
   `session_backend = "jsonl"` the gateway, channels, and tools all read
   the same store; the previous gateway-only SQLite hardcode would have
   recreated the original session-split bug under that config.

2. `make_session_backend` opens SQLite via a new
   `open_sqlite_with_jsonl_import` helper that calls
   `SqliteSessionBackend::migrate_from_jsonl` on first open. Legacy
   `sessions/*.jsonl` files get imported and renamed to `*.jsonl.migrated`,
   so an upgrading deployment on the default config doesn't lose channel
   history. Migration errors log + skip rather than blocking startup.

3. `SessionStore` now overrides `list_sessions_with_metadata` to fill
   `last_activity` from `session_mtime` instead of inheriting the trait
   default (which stamped every key with `Utc::now()`). The orchestrator's
   hydration sort now picks the most-recent JSONL sessions deterministically
   when truncating to MAX_CONVERSATION_SENDERS.

4. New tests cover all three `make_session_backend` arms (jsonl, sqlite,
   unknown→sqlite fallback) plus the JSONL→SQLite import flow with the
   `*.jsonl.migrated` rollback file.

* style(infra): rewrap make_session_backend test asserts to satisfy nightly rustfmt

CI's Lint job runs nightly rustfmt; my local stable rustfmt 1.8.0 left
the long assert!/assert_eq! lines and a use-ordering nit untouched, so
the previous push failed the format check on the merge.

cargo +nightly fmt --all is clean now. c6a0ed7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working risk: high Auto risk: security/runtime/gateway/tools/workflows. size: XS Auto size: <=80 non-doc changed lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Session tools (sessions_list/sessions_history) use JSONL backend instead of SQLite — returns empty for gateway sessions

4 participants